HW 7 - Bayes Rules 4

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(bayesrules)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Question 4.1: Match the prior to the description

Label each as strongly or somewhat favoring \(\pi\), above or below:

  1. Beta(1.8,1.8) – centering \(\pi\) on 0.5
  2. Beta(3,2) – somewhat favoring \(\pi\) > 0.5
  3. Beta(1,10) – strongly favoring \(\pi\) < 0.5
  4. Beta(1,3) – somewhat favoring \(\pi\) < 0.5
  5. Beta(17,2) – strongly favoring \(\pi\) > 0.5

Question 4.2: Match the plot to the code

The arguments that correspond to the graph shown is:

alpha = 3, beta = 8, y = 2, n = 4

I’ll check it:

plot_beta_binomial(alpha = 3,beta=8,y=2,n=4)

Yes! A Match!

Question 4.3: Choice of prior: ginko tree leaf drop

A ginkgo tree sheds all of their leaves at the same time after the first frost. Randi thinks that the local ginkgo tree will drop all its leaves next Monday. She asks 5 friends what they think. Identify reasonable Beta priors for each belief.

  1. Ben says that it is really unlikely – Beta(1,20)

This would convey a Beta that is pretty left skewed.

  1. Albert says that he is quite unsure and hates trees. He has no idea – Beta(1,1)

This uniform distribution feels appropriate because he has no idea and presumably no prior knowledge of tree behavior given how he hates trees.

  1. Katie gives it some thought and, based on what happened last year, thinks that there is a very high chance – Beta(50,2)

Looks like Katie has some prior knowledge and put thought into it so I would think she is more certain of her answer, which is right skewed.

  1. Daryl thinks that there is a decent chance, but he is somewhat unsure – Beta(20,9)

This is also right skewed but less certain than Katie.

  1. Scott thinks it probably won’t happen, but he’s somewhat unsure – Beta(9,20)

This shows similar to Daryl but in reverse. That he is left skewed but less certain than Katie.

Practice Different Priors, Different Posteriors

The local ice cream shop is open until it runs out of ice cream for the day. It’s 2pm and Chad wants to pick up an ice cream cone. He asks his coworkers about the chance \(\pi\) that the shop is still open. Their Beta priors are below:

Kimya - Beta(1,2)

Fernando - Beta(0.5,1)

Ciara - Beta(3,10)

Taylor - Beta(2,0.1)

Question 4.4: Choice of prior

Visualize and summarize in words each coworker’s prior understanding of Chad’s chances to satisfy his ice cream craving.

Kimya thinks its unlikely but their not too sure.

plot_beta(1,2)

Fernando thinks its not likely and their somewhat sure.

plot_beta(0.5,1)

Ciara thinks its not very likely and their pretty sure.

plot_beta(3,10)

Taylor thinks its pretty likely and their pretty sure.

plot_beta(2,0.1)

Question 4.5: Simulating the posterior

Chad finds that on 3 of the past 7 days they were still open at 2pm. For each of the coworkers:

  1. Simulate the posterior model
  2. Create a histogram for the simulated posterior
  3. Use the simulation to approximate the posterior mean value of \(\pi\)

Going to simulate the model by first simulating 10,000 values of each Beta prior and then use the binomial model:

Kimya

Beta(1,2), y = 3, n = 7

set.seed(369)

kimya_sim <- data.frame(pi = rbeta(10000,1,2)) |> 
  mutate(y=rbinom(10000,size=7,prob=pi))

kimya_posterior <- kimya_sim |> 
  filter(y==3)

ggplot(kimya_posterior,aes(x=pi)) +
  geom_histogram(bins=20,color="white") + 
  theme_minimal() +
  labs(title="Kimya Posterior")

kimya_posterior |> 
  summarize(mean(pi))
##   mean(pi)
## 1 0.400331

Fernando

set.seed(369)

fernando_sim <- data.frame(pi = rbeta(10000,0.5,1)) |> 
  mutate(y=rbinom(10000,size=7,prob=pi))

fernando_posterior <- fernando_sim |> 
  filter(y==3)

ggplot(fernando_posterior,aes(x=pi)) +
  geom_histogram(bins=20,color="white") + 
  theme_minimal() +
  labs(title="Fernando Posterior")

fernando_posterior |> 
  summarize(mean(pi))
##    mean(pi)
## 1 0.4227631

Ciara

set.seed(369)

ciara_sim <- data.frame(pi = rbeta(10000,3,10)) |> 
  mutate(y=rbinom(10000,size=7,prob=pi))

ciara_posterior <- ciara_sim |> 
  filter(y==3)

ggplot(ciara_posterior,aes(x=pi)) +
  geom_histogram(bins=20,color="white") + 
  theme_minimal() +
  labs(title="Ciara Posterior")

ciara_posterior |> 
  summarize(mean(pi))
##    mean(pi)
## 1 0.3028835

Taylor

set.seed(369)

taylor_sim <- data.frame(pi = rbeta(10000,2,0.1)) |> 
  mutate(y=rbinom(10000,size=7,prob=pi))

taylor_posterior <- taylor_sim |> 
  filter(y==3)

ggplot(taylor_posterior,aes(x=pi)) +
  geom_histogram(bins=20,color="white") + 
  theme_minimal() +
  labs(title="Taylor Posterior")

taylor_posterior |> 
  summarize(mean(pi))
##   mean(pi)
## 1 0.535582

Question 4.6 Identifying the posterior

For each coworker:

  1. identify the exact posterior model of \(\pi\)
  2. calculate the exact posterior mean of \(\pi\)
  3. compare these to the simulation results in the previous exercise

I’m going to use the summarize beta binomial to get the exact posterior model and the mean for each of these coworker’s priors:

Kimya

summarize_beta_binomial(alpha=1,beta=2,y=3,n=7)
##       model alpha beta      mean  mode        var        sd
## 1     prior     1    2 0.3333333 0.000 0.05555556 0.2357023
## 2 posterior     4    6 0.4000000 0.375 0.02181818 0.1477098
plot_beta_binomial(alpha=1,beta=2,y=3,n=7)

The posterior model looks similar to the simulated model for Kimya. The simulated mean was 0.400331 and the exact mean was 0.40000. These are pretty close!

Fernando

summarize_beta_binomial(alpha=0.5,beta=1,y=3,n=7)
##       model alpha beta      mean      mode        var        sd
## 1     prior   0.5    1 0.3333333 1.0000000 0.08888889 0.2981424
## 2 posterior   3.5    5 0.4117647 0.3846154 0.02549627 0.1596755
plot_beta_binomial(alpha=0.5,beta=1,y=3,n=7)

Again, the posterior model looks pretty similar to the simulated model. The mean in the simulated model was 0.4227621 and the exact mean was 0.4117647. This was a bit more different than Kimya’s mean from the simulated mean.

Ciara

summarize_beta_binomial(alpha=3,beta=10,y=3,n=7)
##       model alpha beta      mean      mode        var        sd
## 1     prior     3   10 0.2307692 0.1818182 0.01267963 0.1126039
## 2 posterior     6   14 0.3000000 0.2777778 0.01000000 0.1000000
plot_beta_binomial(alpha=3,beta=10,y=3,n=7)

Also has a similar shape for the posterior of the simulated model and the exact model. The mean for the simulated model was 0.3028835 and the exact mean was 0.30000. Again, a very similar value.

Taylor

summarize_beta_binomial(alpha=2,beta=0.1,y=3,n=7)
##       model alpha beta      mean      mode        var        sd
## 1     prior     2  0.1 0.9523810 1.0000000 0.01462951 0.1209525
## 2 posterior     5  4.1 0.5494505 0.5633803 0.02451036 0.1565579
plot_beta_binomial(alpha=2,beta=0.1,y=3,n=7)

The simulated histogram for Taylor’s model is the most interesting to me with peaks in more spread out places. Overall though, the shape of the results are similar to the posterior I found here as well. The simulated mean is 0.535582 and the exact mean is 0.5494505. Once again, a pretty similar value.

Practice Balancing the data and prior

Question 4.7: What dominates the posterior?

For each situation, identify if the prior or the data has more of an influence or if there is an equal compromise.

  1. Beta(1,4), Y = 8, n = 10 – the data has more influence on the posterior

  2. Beta(20,3), Y = 0, n = 1 – the prior has more influence on the posterior

  3. Beta(4,2), Y = 1, n = 3 – the prior and data have equal influence on the posterior

  4. Beta(3,10), Y = 10, n = 13 – the prior and data have equal influence on the posterior

  5. Beta(20,2), Y = 10, n = 200 – the data has more influence on the posterior

Question 4.8: Visualizing the evolution

Plot each of the above scenarios and compare the scaled likelihoods, priors, and posterior pdfs:

plot_beta_binomial(alpha = 1,beta=3,y=8,n=10) + labs(title="Graph A")

plot_beta_binomial(alpha=20,beta=3,y=0,n=1) + labs(title="Graph B")

plot_beta_binomial(alpha=4,beta=2,y=1,n=3) + labs(title="Graph C")

plot_beta_binomial(alpha=3,beta=10,y=10,n=13) + labs(title="Graph D")

plot_beta_binomial(alpha =20,beta=2,y=10,n=200) + labs(title="Graph E")

These visualizations show that the data has a stronger influence for graphs a and e. The priors have a stronger influence for graphs b. And the prior and data have similar influence for graphs c and d.

Question 4.9: Different data: more or less sure

Let \(\pi\) denote the proportion of people that prefer dogs to cats. The prior is Beta(7,2).

  1. According to the prior, what are reasonable values for \(\pi\)?

Reasonable values should center around 77% (7/9).

plot_beta(7,2)

  1. If you observe that Y = 19 of n = 20 people prefer dogs, how would that change your understanding of \(\pi\)? Both the evolution of the mean and level of certainty.

This survey gives a very very high rate of dog preference (95%) that will increase the mean of the prior, giving it higher density, and increase the certainty of the \(\pi\) values being so high above 75%, making the graph narrower around the high values.

plot_beta_binomial(alpha=7,beta=2,y=19,n=20)

  1. If instead we observe y = 1 of n =20, how would that change the understanding?

This would pretty drastically change the mean \(\pi\) given how counter it is to the prior. The mean would lower and I would guess the certainty would be pretty consistent because its very different from the prior but the data evidence if very strong.

plot_beta_binomial(alpha=7,beta=2,y=1,n=20)

  1. Instead, we observe y = 10 of n = 20. How would you interpret that?

This data is pretty inconclusive. It doesn’t lean strongly towards preference for dogs or cats, but it does differ from the prior. I would guess that this lowers the mean and lowers the certainty, thus expanding the range of values.

plot_beta_binomial(alpha=7,beta=2,y=10,n=20)

This actually keeps the range of values pretty consistent it looks like. Mathematically speaking, a 50% success rate adds the same value to the alpha and the beta in the prior, so that the posterior model is given by Beta(17,12).

Question 4.10: What was the data?

In each situation, identify n and y and then sketch the prior, scaled likelihood, and posterior pdf.

I’m going to use the formula that alpha_posterior = alpha_prior + y and beta_posterior = beta_prior + n - y. To solve for y and n I can transform those to be:

\(y = \alpha_{posterior} - \alpha_{prior}\)

\(n = \beta_{posterior} - \beta_{prior} + y\)

  1. Beta Prior (0.5,0.5); Beta Posterior (8.5,2.5)

y = 8.5 - 0.5 = 8 n = 2.5 - 0.5 + y = 10

plot_beta_binomial(alpha=0.5,beta=0.5,y=8,n=10)

  1. Beta Prior (0.5,0.5); Beta Posterior (3.5,10.5)

y = 3.5 - 0.5 = 3 n = 10.5 - 0.5 + y = 13

plot_beta_binomial(alpha=0.5,beta=0.5,y=3,n=13)

  1. Beta Prior (10,1); Beta Posterior (12,15)

y = 12 - 10 = 2 n = 15 - 1 + y = 16

plot_beta_binomial(alpha=10,beta=1,y=2,n=16)

  1. Beta Prior (8,3); Beta Posterior (15,6)

y = 15 - 8 = 7 n = 6 - 3 + y = 10

plot_beta_binomial(alpha=8,beta=3,y=7,n=10)

  1. Beta Prior (2,2); Beta Posterior (5,5)

y = 5 - 2 = 3 n = 5 - 2 + y = 6

plot_beta_binomial(alpha=2,beta=2,y=3,n=6)

  1. Beta Prior(1,1); Beta Posterior(30,3)

y = 30 - 1 = 29 n = 3 - 1 + y = 31

plot_beta_binomial(alpha = 1,beta=1,y=29,n=31)

Question 4.11: Different Data, uninformative prior

In each situation we have Beta(1,1) but different data. Identify the posterior model and plot them all

Using the same formula above, I can take the prior alpha and beta and use the data to get the posterior alpha and beta. In this case, 1 + Y = alpha_posterior and 1 + (n-Y) = beta_posterior.

  1. Y = 10, n = 13

Beta Posterior (11,4)

plot_beta_binomial(alpha=1,beta=1,y=10,n=13)

  1. Y = 0, n = 1

Beta Posterior (1,2)

plot_beta_binomial(alpha=1,beta=1,y=0,n=1)

  1. Y = 100, n = 130

Beta Posterior (101,31)

plot_beta_binomial(alpha=1,beta=1,y=100,n=130)

  1. Y = 20, n = 120

Beta Posterior (21, 101)

plot_beta_binomial(alpha=1,beta=1,y=20,n=120)

  1. Y = 234, n = 468

Beta Posterior (235,235)

plot_beta_binomial(alpha=1,beta=1,y=234,n=468)

Question 4.12

Repeat exercise 4.11 with a prior of Beta(10,2). Again, using the formula. In this case, 10 + Y = alpha_posterior and 2 + (n-Y) = beta_posterior.

  1. Y = 10, n = 13 – Beta (20,5)

  2. Y = 0, n = 1 – Beta (10,3)

  3. Y = 100, n = 130 – Beta (110,32)

  4. Y = 20, n = 120 – Beta (30, 102)

  5. Y = 234, n = 468 – Beta (244, 236)

plot_beta_binomial(alpha=10,beta=2,y=10,n=13) + labs(title="a")

plot_beta_binomial(alpha=10,beta=2,y=0,n=1) + labs(title="b")

plot_beta_binomial(alpha=10,beta=2,y=100,n=130) + labs(title="c")

plot_beta_binomial(alpha=10,beta=2,y=20,n=120) + labs(title="d")

plot_beta_binomial(alpha=10,beta=2,y=234,n=468) + labs(title="e")

Question 4.13: Bayesian Bummer

We can screw up a Bayesian. A politician specifies the prior understanding of their approval rating by \(\pi \text{ ~ } Unif(0.5,1) \text{ with pdf } f(\pi) = 2 \text{ when } 0.5 =< \pi < 1 \text{, and } f(\pi) = 0 \text{ when } 0 < \pi < 0.5\)

  1. Sketch the prior (by hand)

Politician Prior

  1. Describe the politician’s prior understanding of \(\pi\)

The politician believes that their approval rating has an equal probability of being anywhere between 50% and 100%.

  1. Their aides show them a poll where 0 of 100 people approve of their job performance. Construct a formula for and sketch the posterior pdf of \(\pi\)

Politician Posterior and Formula

  1. Describe the politicians posterior understanding. Explain the mistake the politician made in specifying the prior.

Their posterior combines their previous understanding that the probability is equal for values between 0.5 and 1, and is not existent for under 0.5 with the data that they got a 0 approval rating. This makes it so their prior model is incompatible with the data they were given.

Question 4.14: Challenge: posterior mode

  1. In the beta-binomial setting, show that we can write the posterior mode of \(\pi\) as the weighted average of the prior mode and the observed sample success rate:

\(Mode(\pi) = \frac{\alpha - 1}{\alpha + \beta - 2}, \text{when } \alpha,\beta > 1\)

Mode Formula

  1. To what value does the posterior mode converge as our sample size n increases?

The posterior mode converges to 0 as n increases to infinity. Because we have n in the denominator.

Practice Sequentiality

Question 4.15: One at a time

Let \(\pi\) be the probability of success for some event of interest You place Beta(2,3) prior on \(\pi\). Sequentially update your posterior for \(\pi\) with each new observation. Once again, I will use the above formula for how to use the data to update alpha posterior and beta posterior (adding successes to alpha and failures to beta).

  1. First observation = success.

Beta(3,3)

  1. Second observation = success

Beta(4,3)

  1. Third observation = failure

Beta(4,4)

  1. Fourth observation = success

Beta(5,4)

All of this is equivalent to Y = 3, n = 4. Which would produce a Beta Posterior (5,4)

Question 4.16: Five at a time

Let Beta(2,3) and update the posterior model after each five observations. I will do this using the same technique as above.

  1. Y = 3, n = 5

Beta(5,5)

  1. Y = 1, n = 5

Beta(6,9)

  1. Y = 1, n = 5

Beta(7,13)

  1. Y = 2, n = 5

Beta(9,18)

Question 4.17: Different data, different posteriors

A show company develops a new ad for sneakers. Three employees share the same prior Beta(4,3). Each employee runs a different study with different data. The first employee has Y = 0, n = 1. The second employee has Y = 3, n = 10. The third employee has Y = 20, n = 100.

  1. Sketch the prior with plot_beta and describe the employees’ prior understandings.
plot_beta(4,3)

The employees think that the probability that a user will click on the ad is likely around 60% and there is not a lot of certainty around that value.

  1. Specify the unique posterior model for each employee.

I’ll use the alpha and beta conversion formulas for each of the employee’s observation data. Namely that alpha posterior = alpha prior + y; beta posterior = beta prior + n - y

Employee 1: Beta(4,4)

Employee 2: Beta(7,10)

Employee 3: Beta(24,83)

  1. Plot the prior, likelihood and posterior pdf for each employee
plot_beta_binomial(alpha=4,beta=3,y=0,n=1) + labs(title="Employee 1")

plot_beta_binomial(alpha=4,beta=3,y=3,n=10) + labs(title="Employee 2")

plot_beta_binomial(alpha=4,beta=3,y=20,n=100) + labs(title="Employee 3")

  1. Summarize and compare the employees’ posterior models of \(\pi\)

Employee 1 gathered basically no data, so their posterior looks pretty similar to their prior, but does have a lower mode for \(\pi\).

Employee 2 has a bit more info and the rate is lower than predicted. So there is a bit higher certainty that the posterior is closer to the observed data than the prior model.

Employee 3 ran a large survey with a lot more data and shows a lot of certainty centering around a much lower value for \(\pi\) (closer to 25%)

Question 4.18: Sequential Employees

The above show company brings in a fourth employee. They start with the same Beta(4,3) prior but don’t collect their own data. They get each employees data sequentially.

  1. What’s the posterior model for employee 4 at the end of each day? Again using the formula to update alpha and beta to the prior by incorporating the success rate values.

Day 1 data: y = 0, n = 1

Beta(4,4)

Day 2 data: y = 3, n = 10

Beta(7,11)

Day 3 data: y = 20, n = 100

Beta(27,91)

  1. Sketch the new employee’s prior and sequential posteriors. Describe how their understanding of \(\pi\) evolved over their first three days on the job:
plot_beta(4,3) + labs(title="Prior")

plot_beta(4,4) + labs(title="Day 1")

plot_beta(7,11) + labs(title ="Day 2")

plot_beta(27,91) + labs(title="Day 3")

  1. suppose instead the new employee didn’t update the data until the third day on the job, after getting data from all three employees. Specify the posterior model and compare it to the day three posterior from part a.

Combining all of the employee data gives Y = 0 + 3 + 20 = 23 and n = 1 + 10 + 100 = 111. This will update the model:

alpha posterior = alpha prior + y = 4 + 23 = 27

beta posterior = beta prior + n - y = 3 + 111 - 23 = 91

Beta(27,91)

Question 4.19 Bechdel Test

Analyze \(\pi\), the proportion of films that pass the Bechdel test. Specify the posterior and calculate the posterior mean and mode.

  1. John has a flat Beta(1,1) prior and analyzes movies from the year 1980.
#Import data
data(bechdel,package="bayesrules")

#Analyze the movies from 1980

bechdel_80 <- bechdel |> 
  filter(year=="1980")

#Get a table that calculates the pass and fails from 1980 films

bechdel_80 |> 
  tabyl(binary) |> 
  adorn_totals("row")
##  binary  n   percent
##    FAIL 10 0.7142857
##    PASS  4 0.2857143
##   Total 14 1.0000000

There was a pass rate of 4 (y = 4) of 14 movies (n = 14).

This would provide a posterior Beta(5,11).

Mean = alpha / alpha + beta Mode = alpha - 1 / alpha + beta - 2

Mean = 5 / 5 + 11 = 5 / 16 = 0.3125

Mode = 5 - 1 / 5 + 11 - 2 = 4 / 14 = 0.2857143

  1. The next day, John analyzes from year 1990, building off their analysis from the previous day:
#1990 data
bechdel_90 <- bechdel |> 
  filter(year=="1990")

bechdel_90 |> 
  tabyl(binary) |> 
  adorn_totals("row")
##  binary  n percent
##    FAIL  9     0.6
##    PASS  6     0.4
##   Total 15     1.0

This provides a y of 6 and n = 15. To update the previous Beta(5,11) with this data gives us a new posterior of Beta(11,20).

Mean = 11 / 11 + 20 = 0.3548387

Mode = 11 - 1 / 11 + 20 - 2 = 0.3448276

  1. The third day, John analyzed movies from the year 2000 building off the past two days’ analyses:
#data from year 2000

bechdel_00 <- bechdel |> 
  filter(year=="2000")

bechdel_00 |> 
  tabyl(binary) |> 
  adorn_totals("row")
##  binary  n   percent
##    FAIL 34 0.5396825
##    PASS 29 0.4603175
##   Total 63 1.0000000

This gives y = 29, n = 63. Updating the Beta(11,20) gives a new posterior of Beta(40,54).

Mean = 40 / 40 + 54 = 0.4255319

Mode = 40 - 1 / 40 + 54 - 2 = 0.423913

  1. Jenna also starts her analysis with Beta(1,1) prior but analyzes movies from 1980, 1990, and 2000 all in one day:
#looking at data from the 80s,90s,00s all at once
bechdel_yrs <- bechdel |> 
  filter(year=="1980" | year=="1990" | year=="2000")

bechdel_yrs |> 
  tabyl(binary) |> 
  adorn_totals("row")
##  binary  n  percent
##    FAIL 53 0.576087
##    PASS 39 0.423913
##   Total 92 1.000000

This gives a total of y = 39 and n = 92 for all of those years. This would gives a posterior of Beta(40,54). This matches the posterior we got from the sequential analysis.

The mean and mode can be calculated the same way as in the sequential case.

Mean = 40 / 40 + 54 = 0.4255319

Mode = 40 - 1 / 40 + 54 - 2 = 0.423913

Question 4.20: Bayesian and Frequentist - Sequential Edition

We can use Bayes to sequentially update our understanding of a parameter of interest. How is this different from what a frequentist approach would be? How is it similar?

Frequentist methods would just take the new data presented as a representation of the proportion of movies that pass the bechdel test without taking the prior into account. So each day, the most recent understanding would update with the data regardless of the priors we get, causing a lot more variance in how we see the stat day to day. It is similar in that if a frequentist adds yesterday’s data to make it cummulative, it should give a similar answer between the sequential analysis and the total. I.e., if day one the data they use is 4/14 then day 2 they use 4 + 6 / 14 + 15 = 10 / 29 etc. then the last day will produce the same value as if they took all pieces of data at once.